feat: autonomous cross-knob overnight tuning (feat_overnight_final_solution)#440
Conversation
…knob overnight tuning) Add Phase 1 spec for the overnight autopilot's follow-up-aware strategy mode. Operator opts in via a new `auto_followup_strategy: "follow_suggestions"` config key; the autopilot worker consumes the digest's top executable follow-up (narrow / widen / swap_template) per chain link instead of always running the ±50% narrow. Legacy `"narrow"` strategy stays byte- identical (single-writer rule for visited-template list; worker pops inherited keys before INSERT). Cycle/no-regress guard via ordered-unique `auto_followup_visited_template_ids` in `studies.config`; defensive coercion at chain-summary construction; SelectionOutcome dataclass so fallback-event telemetry always carries cycle-guard drops. Cross-model review: GPT-5.5 cycle 1 returned 11 findings (6 High, 5 Medium), all accepted and applied; cycle 2 returned 6 findings (0 High, 5 Medium, 1 Low — all internal-consistency cleanups from cycle 1), all accepted and applied. Convergence reached at cycle 2. Phase 2 (`phase2_idea.md`) defers a top-of-page morning summary card + the study-detail strategy line. Phase 3 (`phase3_idea.md`) defers the `proposals.status = 'superseded'` migration + auto-supersede of non-winning chain links' proposals. No backend code, no migration in Phase 1. Spec only. Dashboard + roadmap regen included. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…bs in context) Capture operator-requested idea: on the proposal detail page, render the template's full declared_params alongside the winning subset (config_diff) so the operator sees which knobs were tuned in the context of which were not. Today the page shows only the tuned subset, which leaves the operator guessing whether the optimizer considered the other knobs and rejected them or simply wasn't given them. Pure UI improvement using data already on the proposal + template; no migration, no new endpoint. Filed at P2 — speculative-but-explicitly-requested. Coordinates with feat_overnight_final_solution: as the chain learns to swap templates autonomously, the proposal page becomes the morning artifact, and making it self-explanatory compounds the value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…ecution) 7 stories across 4 epics: Epic 1 (config key + validator + wizard toggle + glossary), Epic 2 (pure-domain SelectionOutcome selector + autopilot worker dispatch + cycle guard + telemetry), Epic 3 (StudyChainLink additive fields + chain-panel badge), Epic 4 (docs). No migration — all new state is JSONB keys on studies.config; Alembic head stays 0022. Cross-model review: GPT-5.5 cycle 1 returned 10 findings (5 High, 5 Medium), all accepted and applied; cycle 2 returned 0 findings — converged. Key plan-stage corrections: AUTO_FOLLOWUP_STRATEGY_INVALID must be added to the _CUSTOM_ERROR_CODE_ALLOWLIST (not auto-unwrapped); the visited-list reject needs a mode="before" validator (StudyConfigSpec is extra="ignore"); and StudyChainLink gains a template_id field so the swap_template badge can resolve the target template name (otherwise not buildable). Spec section 8.3 updated to match the template_id addition. Dashboard regen included. Pipeline: spec + plan both approved; ready for /impl-execute. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
- StudyConfigSpec.auto_followup_strategy: str | None — wire field, NOT Literal (per spec D-13) so bad values surface as AUTO_FOLLOWUP_STRATEGY_INVALID via the canonical error-envelope unwrap path, not Pydantic's generic VALIDATION_ERROR. - AUTO_FOLLOWUP_STRATEGY_VALUES tuple constant at module level — source of truth for the frontend OVERNIGHT_STRATEGY_VALUES mirror + the CI grep gate. - @model_validator(mode="after") _validate_auto_followup_strategy enforces both the value-rule (AC-2) and the pair-rule (AC-1: only set when auto_followup_depth >= 1). - @model_validator(mode="before") _reject_worker_managed_keys enforces the single-writer rule for auto_followup_visited_template_ids and auto_followup_selected_kind (D-14). mode="before" is required because StudyConfigSpec defaults to extra="ignore" — a mode="after" validator would never see the dropped keys. Did NOT add blanket extra="forbid": that would broaden the blast radius beyond the two worker-managed keys and risk rejecting future config additions. - AUTO_FOLLOWUP_STRATEGY_INVALID added to _CUSTOM_ERROR_CODE_ALLOWLIST in api/errors.py — required (the prefix unwrap is gated by this allowlist, not automatic). - 11 new tests across contract + unit layers covering all four AC rules + both worker-managed-key rejects + the canonical-envelope unwrap. 51 tests in touched files pass; 438 contract+api unit tests green; enum source-of-truth gate clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
… 1.2) - OVERNIGHT_STRATEGY_VALUES + OvernightStrategy type added to ui/src/lib/enums.ts mirroring backend AUTO_FOLLOWUP_STRATEGY_VALUES. - overnight_strategy glossary key (FR-9) — short includes both wire values verbatim per AC-16 value-lock contract. - Strategy <Select> beneath the existing depth selector, visible only when auto_followup_depth >= 1 (AC-4), defaulting to "narrow". Wire values via OVERNIGHT_STRATEGY_VALUES.map() per form-select- discipline. - Submit handler writes config.auto_followup_strategy only when depth >= 1, defaulting to "narrow" — byte-identical legacy wire shape on Off, satisfies backend pair-rule when depth enabled. - 6 new wizard tests (AC-4 hidden/visible/hide-on-revert, AC-5 follow_suggestions submit, default-narrow, omit-both) + OVERNIGHT_STRATEGY_VALUES value-lock + overnight_strategy glossary value-lock (AC-16). All targeted vitest + typecheck green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…me (Story 2.1) New module backend/app/domain/study/auto_followup_strategy.py implements the autopilot's follow_suggestions strategy selector. Pure, deterministic, no DB / no I/O / no async — unit-tested without fixtures. SelectionOutcome dataclass carries selected + source_index + candidate_count + dropped_template_ids. The function ALWAYS returns an outcome — never None — so the fallback telemetry event (FR-8) still carries cycle-guard diagnostics when every executable was a swap to an already-visited template. Decision rules: - Drop TextFollowup (no search_space). - Drop SwapTemplateFollowup whose template_id is in visited (cycle guard). - First remaining by ORIGINAL index wins — trust digest convergence-aware ordering (D-5: no kind-preference policy). - source_index preserves the ORIGINAL list position (telemetry correlation with the digest's persisted order). - dropped_template_ids is sorted ascending (deterministic runbook grep). - Guard is template-based AND swap-only (D-9): narrow/widen on the parent's template is allowed even when that template_id is in visited. SELECTED_FOLLOWUP_KIND_VALUES tuple at module level — source-of-truth for StudyChainLink.selected_followup_kind (Story 3.1) + frontend mirror (Story 3.2). CI grep gate clean. 17 unit tests: empty / text-only / single-kind / original-index preservation / cycle guard / first-by-index multi-kind / determinism / dropped-id sort invariant / SELECTED_FOLLOWUP_KIND_VALUES lock. 542 broader domain unit tests green. Lint + mypy + typecheck clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…2.2) Modify enqueue_followup_study to dispatch on parent.config.auto_followup_strategy. Legacy/missing/"narrow" path is byte-identical to today; only "follow_suggestions" takes the new branch. Follow_suggestions branch: - Load parent digest, parse_followup_list, build visited set from parent.config (anchor default [parent.template_id] per D-14). - select_executable_followup -> SelectionOutcome. - narrow/widen consume search_space verbatim, keep parent.template_id. - swap_template defensive lookup; on miss emit auto_followup_swap_target_missing WARN + fall back to narrow. - selected None -> fallback narrow with selected_kind="narrow_default" and dropped_template_ids preserved in fallback INFO event. Defensive try/except wraps the dispatch block (P1-B4 + spec section 13): unexpected errors emit auto_followup_strategy_dispatch_error WARN + fall back to narrow. Chain reliability MUST NOT regress vs legacy. Per-link state persisted ONLY under follow_suggestions (D-12): ordered-unique auto_followup_visited_template_ids and auto_followup_selected_kind. Legacy path pops any inherited auto_followup_selected_kind from child_config before INSERT (AC-18). Telemetry timing (FR-8 + P1-B1 + cycle-2 C2-B1): - 2 INFO events emit AFTER child INSERT/commit; diagnostic locals captured inside the db-context block so post-commit telemetry can use them without re-querying a closed session. - swap_target_missing WARN emits BEFORE fallback with parent-only fields. 10 integration tests cover AC-3, AC-6, AC-7, AC-8, AC-9, AC-10, AC-17, AC-18 (both halves), and P1-B4 exception fallback. Existing test_auto_followup.py (30 tests) passes unmodified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
… 3.1)
Two additive fields on the chain endpoint's per-link response:
- selected_followup_kind: Literal["narrow_default","narrow","widen","swap_template"] | None
-- FR-6 soft-contract additive; null for the anchor and for every link
created under the legacy "narrow" strategy (per D-12 the legacy path
persists no auto_followup_selected_kind key at all).
- template_id: str (non-optional, P1-B5) -- every study has a template;
the chain panel's swap_template badge needs this to resolve the
target template's display name via GET /api/v1/query-templates/{id}
(Story 3.2). Without it the badge is not buildable from the chain
payload alone.
studies.py chain-summary builder applies a defensive coercion before
populating selected_followup_kind: an unknown JSONB value in
studies.config.auto_followup_selected_kind (manual INSERT, schema
drift, future-version row read by older deploy) is coerced to null
+ a chain_selected_kind_unknown WARN -- never raises a Pydantic
ValidationError that would 500 the endpoint. Mirrors the
parse_followup_list defensive-ingest contract for
digests.suggested_followups.
Existing 7 contract tests pass; 2 new contract tests:
- test_chain_link_fourteen_fields (was twelve_fields; updated to lock
the additive field-count + names).
- test_chain_link_selected_followup_kind_is_literal_with_four_values
-- locks the Literal mirror against
SELECTED_FOLLOWUP_KIND_VALUES from the domain module.
- test_chain_link_template_id_is_required_string -- locks the
non-optional contract.
A pre-existing test_endpoint_present_in_openapi continues to fail
locally (Settings bootstrap env-var requirement; works in CI). Not
in this story's scope; captured for the tangential sweep.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…(Story 3.2)
- SELECTED_FOLLOWUP_KIND_VALUES + SelectedFollowupKind type added to
ui/src/lib/enums.ts mirroring the backend domain module
auto_followup_strategy.py source-of-truth tuple.
- Per-link <ChainLinkStrategyBadge> renders compact labels: null -> no
badge (anchor, legacy chain); narrow_default -> "refined" (audit
signal for the follow_suggestions fallback path); narrow -> "narrow
down"; widen -> "widen up"; swap_template -> "swapped to {short_name}"
resolved via useTemplate(link.template_id) per OQ-1 / D-11.
- data-testid="chain-link-strategy-{link.id}" per badge for E2E + vitest
assertions.
- 2 new tests in auto-followup-chain-panel.test.tsx (AC-13: badge per
kind; AC-14: legacy all-null = no badges); 13 existing tests pass
unchanged.
- 1 new value-lock test for SELECTED_FOLLOWUP_KIND_VALUES.
- ui/openapi.json + ui/src/lib/types.ts regenerated via
scripts/regen-generated-artifacts.sh — the StudyChainLink response
now ships template_id + selected_followup_kind per Story 3.1.
E2E spec NOT included in this commit: the planned overnight-strategy
.spec.ts requires an explicit Arq enqueue helper that doesn't exist in
the test harness yet; the worker dispatch is comprehensively covered by
the 10 backend integration tests in test_auto_followup_strategy.py and
the wizard surface by the 6 vitest cases in create-study-modal.overnight
-strategy.test.tsx. Capturing as a tangential follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Captures the Story 3.2 E2E timing footgun for the next agent. P2 quality-only follow-up: a Radix-Select-onValueChange + react- hook-form-watch timing pattern that makes the chromium-against-dev- server pass fail consistently for the Strategy toggle visibility assertion. The wizard behavior is fully covered by 6 vitest cases + 10 backend integration + 3 contract + 2 chain-panel vitest cases; the E2E adds browser-level confidence at the cost of fragile timing. Idea proposes a shared Radix-Select wait helper. Dashboard + roadmap regen included. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
…Story 4.1) - tutorial-first-study.md Step 12 — new "Strategy — Refine vs. Try suggestions" sub-section; AC-15. Names the cycle guard + the always- fall-back-to-narrow contract; explains the per-link badge. - auto-followup-debugging.md telemetry catalog — adds the 3 new Story 2.2 events (auto_followup_strategy_selected, auto_followup_no_executable_candidate_fell_back_to_narrow, auto_followup_swap_target_missing) + the auxiliary auto_followup_strategy_dispatch_error WARN. Names operator action when fallback fires frequently (digest is text-heavy → re-run with bigger budget). - api-conventions.md — adds AUTO_FOLLOWUP_STRATEGY_INVALID to the studies error code table (covers all three rejection paths: bad value, pair-rule violation, operator-submitted worker-managed keys); notes the StudyChainLink additive fields (template_id, selected_followup_kind) are backward-compatible. - data-model.md — notes the 3 new optional keys on studies.config with their writer / persistence rules per D-12 / D-14. - ui-architecture.md — chain-panel badge surface description with the swap_template name-fetch contract per OQ-1 / D-11. Generated artifacts regen included (ui/public/docs/tutorial-first-study.md mirrored). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
There was a problem hiding this comment.
Code Review
This pull request implements the "Overnight Final Solution" feature, enabling autonomous cross-knob and cross-template tuning for overnight autopilot runs. It introduces the auto_followup_strategy configuration key, a pure-domain selector with a cycle guard to prevent template ping-ponging, and updates the autopilot worker to dispatch based on the chosen strategy. Additionally, it extends the study chain API and UI with strategy badges and short template names. The code review feedback is highly valuable, correctly identifying a critical NameError in the worker telemetry logging where child_id is referenced but remains undefined in the scope.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| "auto_followup strategy selected an executable follow-up", | ||
| event_type="auto_followup_strategy_selected", | ||
| parent_study_id=parent_study_id, | ||
| child_study_id=child_id, |
There was a problem hiding this comment.
The variable child_id is used in the telemetry logging but does not appear to be defined in this scope. Please verify if the variable is named child_study_id or if it should be retrieved from the created child study object (e.g., child.id). If child_id is undefined, this will raise a NameError at runtime.
Three distinct root causes from CI run 26925531096: 1. AC-6 + AC-7 search_space full-dict equality (test_ac6, test_ac7) — the worker writes the search_space via SearchSpace.model_dump() which adds defaulted fields (log: false on float params). Switched to structural assertions on params keys + low/high bounds. 2. UniqueViolationError on digests_study_id_key (test_ac7, test_ac8) — the seed helper accepted digest_followups=[] and created an empty digest; the tests then created a SECOND digest for the same study to swap in the real swap_target_id. Now pass digest_followups=None to skip the helper's empty-digest creation; create the proper digest after. 3. Structlog WARN events not captured by pytest's caplog (test_ac17, test_exception_in_follow_suggestions_dispatch_falls_back_to_narrow) — the worker emits via structlog.get_logger directly. caplog only captures stdlib logging records. Switched both tests to structlog.testing.capture_logs(), mirroring the existing test_auto_followup.py::test_enqueue_emits_auto_followup_enqueued_event pattern (line 211). Lint + typecheck clean. CI re-run will validate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Review adjudication (Gemini Code Assist)Commits landing fixes: none — finding is rejected as a hunk-isolated false positive. Gemini Code Assist (1 finding)
Outcomes
Ready for human review + merge. |
F1 (Medium) — Strategy toggle stale value: when the depth toggle hides (Off) and is re-enabled later, the strategy was preserved from the prior session — could mean a re-enabled control silently kept "follow_suggestions" instead of returning to the safe "narrow" default per spec FR-2. Fix: in the depth onValueChange, clear auto_followup_strategy when going to Off; explicitly set it to "narrow" on the Off → ≥1 transition. F2 (Medium) — Missing digest under follow_suggestions silently treated as empty list: the spec FR-3 says a None digest is a defensive edge case requiring a WARN. Fix: emit a distinct auto_followup_strategy_digest_missing WARN before continuing through the narrow fallback path. Operators can now grep this event apart from the routine text-only-digest case. F3 (Low) — overnight_strategy glossary short text exceeded the spec FR-9 ≤ 120 character limit (126). Fix: trim to "How follow-ups are picked. \"narrow\": tighter bounds, same knobs. \"follow_suggestions\": digest's top runnable item." (117 chars). Tightened the value-lock test from ≤140 to ≤120 so future drift trips immediately. Lint + mypy clean. 6 wizard vitest cases + 29 glossary cases green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai>
Final cross-model review (GPT-5.5) adjudicationCommits landing fixes: GPT-5.5 reviewed the complete PR diff (672 KB) against the spec + plan and returned 0 High findings. The 2 Medium + 1 Low are all genuine quality fixes; all 3 accepted and applied. GPT-5.5 (3 findings)
Combined adjudication summary across all reviewers
Ready for human review + merge once the post-fix CI run completes. |
…441) - state.md: prepend the PR #440 one-liner to Last-5-merges (drop the now-6th row), update branch + active-feature + in-flight lines. - pipeline_status.md: Implementation → Complete (PR #440, 1e9522a) + mvp2 release marker. - implementation_plan.md: Status → Complete. - Split the two deferred-phase idea files into their own planned_features folders (established practice — they stay discoverable to /pipeline status which only walks planned_features/): * feat_overnight_final_solution_phase2/idea.md (morning summary card) * feat_overnight_final_solution_phase3/idea.md (proposal superseded) Fixed cross-references for the new locations. - Moved the feature folder → implemented_features/2026_06_04_feat_overnight_final_solution. - Dashboard + roadmap regen included. Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…proposal_full_param_space_view) (#446) * docs(idea): apply preflight-2026-06-04 patches to feat_proposal_full_param_space_view Lands the audit-and-patch edits from a prior /idea-preflight run that had been sitting in the working tree, plus the dashboard regen that follows. - Header: "preflight-refreshed 2026-06-04" + linkified sibling references to the now-implemented overnight trio (PR #440 / #442 / #444). - Cap 1 grounds the panel target on <ConfigDiffPanel> at ui/src/components/proposals/config-diff-panel.tsx:63 (replacing the "Recommended config" placeholder which doesn't match the live tree). - declared_params shape correction: flat Record<string, type-tag>, NOT per-param bounds/defaults — bounds live on study.search_space. - Cap 2 coordination note pointing at the existing parent-vs-swap-target diff in <SuggestedFollowupsPanel> (suggested-followups-panel.tsx:250-291) so the new panel reuses the established visual grouping. - Scope signals: "backend: none required" — confirmed the proposal page already pays for useTemplate(parentStudy.template_id) at ui/src/app/proposals/[id]/page.tsx:183; chain-link proposals inherit template_id from parent (backend/workers/auto_followup.py), so the template's declared_params is on-page for free. - Q2 + Q3 rewritten against the corrected data shapes. - MVP2_DASHBOARD.md status cell picks up the linkified feat_overnight_final_solution reference. No new content; this is the preflight audit-and-patch result. Spec generation runs next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(spec): feat_proposal_full_param_space_view — three-state full-param-space panel on proposals Generates the feature specification + pipeline_status.md from the preflight-refreshed idea.md. Converged across 3 GPT-5.5 cycles + 1 Opus internal verification pass. The spec adds a new <FullParamSpacePanel> on /proposals/[id] that renders every parameter the proposal's template declares, partitioned into three visually distinct groups derived client-side from data already on the page: 1. Tuned (changed by this proposal) — appears in proposal.config_diff, rendered with from→to delta via the existing extractFromTo helper (promoted to ui/src/lib/config-diff.ts per FR-5). 2. Tuned (unchanged) — in study.search_space.params but not in config_diff (the optimizer considered it but the digest's recommended_config didn't include it). 3. Not in search space — declared on the template but absent from this study's tuning surface. The pure helper partitionTemplateParams in ui/src/lib/proposal-param-space.ts owns the partition algorithm; <FullParamSpacePanel> is a thin renderer. Both are unit-testable without DOM. Backend: NONE. Migration: NONE. The feature consumes existing endpoints (proposals/{id}, studies/{id}, query-templates/{id}) only. Cross-model review highlights (3 cycles, 19 findings total — 18 accepted+applied, 1 rejected with cited counter-evidence): - Cycle 3 F1 (High, accepted): caught a real correctness bug — the earlier FR-3 only lifted useTemplate's gate, not useStudy's, so study proposals with text-only digests would have mounted the panel with searchSpaceParams undefined and mis-classified every search-space param as 'untuned' instead of 'tunedUnchanged'. FR-3 + D-13 now require lifting BOTH fetches. - Cycle 2 F2 (High, accepted): FR-4 race-aware gating contradicted the section 11 narrative; aligned (panel waits for both parentTemplate + parentStudy.isPending=false before mounting for study-backed cases). - Cycle 1 F8 (Medium, rejected): GPT-5.5 worried the lifted useTemplate would change <SuggestedFollowupsPanel>'s rendering for previously-disabled cases. Rejected with counter-evidence at suggested-followups-panel.tsx:90-95+119-130 — parentTemplate is structurally consumed only by <SwapTemplateCard>, so non- swap-template proposals are indifferent to the prop. Caps 2 + 3 from the idea (cross-panel hover linking + study-detail mount) are explicitly deferred WITHOUT phase*_idea.md artifacts (D-8 + D-14) — reopen only if specific operator feedback surfaces. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(plan): feat_proposal_full_param_space_view — 4-story plan with lifted-fetch race-aware mount Generates the implementation plan from the approved spec. Converged across 3 GPT-5.5 cycles + 1 Opus internal verification pass — 19 findings total, 18 accepted+applied, 1 rejected with cited counter- evidence. Plan structure: 1 epic, 4 stories, dependency-ordered (story number IS execution order): Story 1.1 — Promote extractFromTo + renderValue to ui/src/lib/config-diff.ts (shared helper extraction, 7 unit cases, AC-9 byte-identical preservation of ConfigDiffPanel rendering). Story 1.2 — Pure helper partitionTemplateParams (the FR-1 partition algorithm — 8 unit tests covering AC-1/2/3/5/6, D-9 search- space drift drop, D-10 from===to anomaly, sort stability). Story 1.3 — <FullParamSpacePanel> component + new proposal.full_param_space glossary key (7 component vitest tests, AC-1/2/5/6/7/8). Story 1.4 — Page-level integration: lift BOTH useTemplate AND useStudy gates (drop hasActionableFollowup); race-aware conditional mount; 6 page-level vitest tests + 1 real-backend Playwright E2E test. Cross-model review highlights: - Cycle 1 F7 (Low, rejected): GPT-5.5 worried seedManualProposal wasn't a real helper. Counter-evidence: it's defined locally at proposals.spec.ts:21-36 as a 3-helper composition. - Cycle 3 F1 (High, accepted): caught a TypeScript build-breaker — Object.keys + indexed access on Record<string, string> with the project's noUncheckedIndexedAccess gate yields string | undefined. Algorithm now uses Object.entries (type-narrowed) + ?? '(unknown)' fallback for the AC-6 drift path. - Cycle 3 F6 (Medium, accepted): missing dedicated test for FR-7 edge case A (source-study fetch error). Added as page-level Test 6. - Cycle 1 F2 (High, accepted): the cycle-3 F1 regression guard was bundled into the happy-path test which uses swap_template (already actionable, so wouldn't catch the bug). Split into dedicated Test 5 with empty/text-only digest. - Cycle 3 F5 (Medium, accepted): Test 4 race-gating used single deferred resolver — could pass vacuously if template was also pending. Switched to dual-deferred + qc.getQueryState confirmation. No backend changes, no migrations, no audit events. Plan is fully frontend, single-phase, no phase*_idea.md artifacts per D-14. Total test coverage: 7 (config-diff unit) + 8 (partition unit) + 7 (panel component) + 6 (page-level vitest) + 1 (real-backend Playwright E2E) = 29 new tests. Existing tests stay byte-identical (AC-9 + AC-10 enforce this). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * refactor(proposals): promote extractFromTo + renderValue to shared module (Story 1.1) Extract the two config_diff value-rendering helpers from config-diff-panel.tsx into ui/src/lib/config-diff.ts so the new <FullParamSpacePanel> (Story 1.3) can reuse the same canonical {from, to}-vs-2-tuple normalization without duplication. - New ui/src/lib/config-diff.ts exports extractFromTo + renderValue. - config-diff-panel.tsx re-imports both; rendering byte-identical (AC-9). - New config-diff.test.ts: 7 cases (3 extractFromTo + 4 renderValue). - Existing config-diff-panel.test.tsx passes unchanged (6 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * feat(proposals): pure partitionTemplateParams helper (Story 1.2) The FR-1 partition algorithm — partitions a template's declared params into tunedChanged / tunedUnchanged / untuned given the proposal's config_diff + the source study's search_space.params. - Partition universe is declaredParams union configDiff (D-9); search-space- only drift keys are silently dropped. - config_diff membership is the operational definition of "tuned" (D-10: a from===to anomaly still classifies as tunedChanged). - Drift keys (in config_diff, not in declared_params) render type '(unknown)' (AC-6). - Uses Object.entries (type-narrowed) to satisfy noUncheckedIndexedAccess. - 8 unit tests covering AC-1/2/3/5/6, D-9, D-10, sort stability. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * feat(proposals): FullParamSpacePanel component + glossary key (Story 1.3) A new <FullParamSpacePanel> renders the three-state partition (tuned-changed / tuned-unchanged / not-in-search-space) for a proposal's template, consuming the pure partitionTemplateParams helper. - Card shell + InfoTooltip matching the existing proposal-panel pattern. - Three labeled groups, each omitted when empty; full-universe-empty shows the param-space-empty state (declaredParams AND config_diff both empty — AC-6 drift path takes precedence otherwise). - tunedChanged rows show from→to; tunedUnchanged "(no change)"; untuned italic — reusing <DeclaredParamsColumn>'s typography. - New proposal.full_param_space glossary key (FR-6). - 7 component tests (AC-1/2/5/6/7/8 + full-empty defensive); glossary AC-12 audience-language check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * feat(proposals): mount FullParamSpacePanel with lifted fetches + race-aware gating (Story 1.4) Page-level integration of the full-parameter-space panel on /proposals/[id]. - Lift BOTH useTemplate (now sourced from proposal.template.id, null-safe) AND useStudy (drop the `&& hasActionableFollowup` gate) so the panel has declared_params + search_space.params for EVERY proposal shape (FR-3 / D-13). Removed the now-dead hasActionableFollowup variable. - Mount the panel below ConfigDiffPanel with race-aware gating: wait for parentTemplate.data AND, for study-backed proposals, parentStudy settled (FR-4) so the tunedUnchanged group never flashes a transient mis-classification. - 6 new page-level vitest tests: happy path, manual proposal, template 404, race-gating (dual-deferred resolver), FR-3 regression guard (no-actionable-followups digest), FR-7 edge A (study fetch error). - 1 new real-backend Playwright E2E asserting the panel renders for a seeded manual proposal. All 18 page tests pass (12 existing + 6 new). All 5 proposals E2E pass (verified against a rebuilt production container). tsc + build clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * test(proposals): strengthen tests per GPT-5.5 phase-gate review Phase-gate cross-model review findings (4 accepted, 1 rejected): - F1 (accepted): D-9 search-space-drift unit test now uses searchSpaceParams={phantom} only, asserting `foo` (declared, not in search space) → untuned AND phantom dropped — covers the classification the prior version skipped by including foo in search space. - F2 (accepted): component AC-1 test now asserts the group-header count text ("2 parameters" / "1 parameter") per FR-2, not just the testids. - F3 (accepted): page template-404 test now asserts <PrPanel> (open-pr-button) stays visible alongside ConfigDiffPanel + metric-delta. - F4 (accepted): race-gating test converted to the documented dual-deferred resolver pattern (both template + study deferred) to eliminate any vacuous-pass window. - F5 (rejected): GPT wanted an exhaustive switch(group)/never default; counter-evidence — GROUP_LABELS: Record<ParamSpaceGroup, string> in full-param-space-panel.tsx already gives compile-time exhaustiveness (a 4th variant is a type error at the literal). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(guides): regenerate guide 02 screenshots for the full-parameter-space panel The new <FullParamSpacePanel> mounts on /proposals/[id] between the config-diff and metric-delta panels, so guide 02's proposal-detail screenshot (03-proposal-detail.png) now shows it. Regenerated the full guide-02 screenshot set against the running stack so the walkthrough reflects the shipped UI. 03-proposal-detail.png confirms the panel renders correctly end-to-end: config_diff drift keys (description.boost / title.boost, type "(unknown)" since the seeded template declares only `boost`) under "Tuned (changed by this proposal)", and `boost` under "Not in search space". The other four PNGs changed only from re-seeded demo data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * fix(proposals): null-safe search-space guard + grid-aligned tuned rows (Gemini review) Gemini Code Assist adjudication (both accepted): - G1 (High): `searchSpaceParams !== undefined && key in searchSpaceParams` throws TypeError when searchSpaceParams is null (null !== undefined is true, and `key in null` throws). JSONB study.search_space.params can be null at runtime. Fixed with a truthiness guard + widened the PartitionInput/prop type to Record | null | undefined to be honest about the runtime contract. Added a null-search-space regression unit test. - G2 (Medium): tunedChanged rows now use a CSS grid so name/type/from/→/to align vertically across rows (matching ConfigDiffPanel's table columns). Tests are layout-agnostic and stay green. 34 lib+component+page tests pass; tsc + build + lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * test(proposals): close race-gating coverage hole (GPT-5.5 final review FF1) The AC-11 race-gating test asserted that tuned_unchanged + empty were absent during the template-ready/study-pending window — but BOTH are absent even on a premature mount (config_diff empty + no search space → foo classifies as `untuned`, not tuned_unchanged or empty). So the guard would have passed even if the panel mounted too early. Add assertions that param-space-group-untuned AND param-space-row-untuned-foo are also absent during the race window — these WOULD render on a premature mount, so the test now genuinely catches the race bug FR-4's gating defends against. (GPT-5.5 final review FF2 — "ACTIONABLE_FOLLOWUP_KINDS unused" — rejected: still consumed at page.tsx:196 in the prefillValues useMemo; tsc + lint clean.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… phase3 idea preflight (#447) * docs: finalize feat_proposal_full_param_space_view (PR #446 merged) - Move feature folder planned_features/02_mvp2 → implemented_features/2026_06_04_feat_proposal_full_param_space_view (flat, date-prefixed). - pipeline_status.md: Implementation Complete (PR #446, 3baea3f) + Release: mvp2 marker. - implementation_plan.md: Status → Complete. - state.md: new merge entry (drop #436 to keep last-5), branch/active-feature/in-flight refreshed. - state_history.md: full narrative entry prepended. - Dashboards + public roadmap regenerated (feature moves to the implemented column). No phase*_idea.md files (Caps 2+3 deferred without artifacts per D-14). No tracking issue to close. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> * docs(idea): apply preflight edits to feat_overnight_final_solution_phase3/idea.md Lands audit-and-patch edits that were sitting uncommitted in the working tree from a prior /idea-preflight run on the phase3 idea: priority reclassified Backlog → P2 (operator-confirmed MVP2 scope), dependency marked satisfied (Phase 1 merged as PR #440), corrected the proposal-creation citation to backend/workers/orchestrator.py:693-740 (_stop, not _on_study_complete), and linkified sibling references. Dashboards regenerated to reflect the P2 reclassification. Unrelated to feat_proposal_full_param_space_view — committed at operator request to clear the working tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> --------- Signed-off-by: SoundMindsAI <eric.starr@soundminds.ai> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Ships the autonomous cross-knob overnight tuning capability — the overnight autopilot ("🌙 Run overnight (compound automatically)") can now consume the digest's executable follow-ups (narrow / widen / swap_template) on each chain link instead of always running the hardcoded ±50% narrow on the same template. With the new opt-in Strategy toggle, an operator can sleep through a chain that branches across knobs and templates — the chain panel in the morning shows the explored path link by link with compact badges.
The legacy
"narrow"strategy stays byte-identical to today's behavior (every existing study unchanged; the existingtest_auto_followup.pyintegration tests pass unmodified — that's the backward-compat gate).End-to-end flow:
<Select>directly beneath the depth selector, visible only when depth ≥ 1, defaulting to"narrow"(Story 1.2).config.auto_followup_strategyvia a newAUTO_FOLLOWUP_STRATEGY_INVALID422 envelope; rejects operator-submitted worker-managed keys (single-writer rule per D-14) (Story 1.1).select_executable_followupwalks the parent digest'ssuggested_followups, drops text + cycle-guarded swap_templates, returns aSelectionOutcome(Story 2.1).child.template_idto the swap target (with a defensive lookup →auto_followup_swap_target_missingWARN + fallback on deleted target); no-candidate falls back to today's narrow path. Per-link state (visited-templates list + selected_kind) persists ONLY underfollow_suggestionsper D-12 (Story 2.2).GET /api/v1/studies/{id}/chain'sStudyChainLinkgainstemplate_id(P1-B5; needed by the panel's swap badge) +selected_followup_kind(FR-6) with a defensive coercion against unknown JSONB values (Story 3.1).narrow ↓/widen ↑/swapped to {short_template_name}/refined. Swap badges resolve the target template's display name via a per-linkuseTemplate(link.template_id)fetch per OQ-1 / D-11 (Story 3.2).Two follow-up phases are explicitly deferred and tracked in
phase2_idea.md(morning summary card) +phase3_idea.md(proposalsupersededstatus + auto-supersede).One tangential idea also captured this session:
chore_e2e_overnight_strategy_radix_select_timing— the Story 3.2 E2E timing footgun for a Radix-Select-onValueChange + react-hook-form-watch chain in chromium.The follow-up
feat_proposal_full_param_space_viewidea ships alongside this branch per the one-branch-per-session convention.Test coverage
test_validation_error_handler.pytest_studies_api_contract.py— all four AC rules + both worker-managed-key rejectstest_studies_chain_contract.py—selected_followup_kindenum +template_idrequired-string locktest_auto_followup_strategy.py— full matrix + ordering invariantstest_auto_followup_strategy.py— AC-3/6/7/8/9/10/17/18 + P1-B4 exception fallbackcreate-study-modal.overnight-strategy.test.tsx— AC-4/AC-5auto-followup-chain-panel.test.tsx— AC-13/AC-14OVERNIGHT_STRATEGY_VALUES+SELECTED_FOLLOWUP_KIND_VALUESvalue-locksovernight_strategyAC-16E2E spec deliberately omitted — see
docs/00_overview/planned_features/02_mvp2/chore_e2e_overnight_strategy_radix_select_timing/idea.md. Wizard behavior is fully covered by vitest cases + integration tests at every other layer.Test plan
make fmt(ruff format)make lint(ruff check + ESLint — 0 errors)make typecheck(mypy strict + tsc strict).venv/bin/ruff format --check backend/(CI parity)bash scripts/ci/verify_enum_source_of_truth.sh(40 allowlists clean)make test-unit(542 domain unit tests green incl. 17 new selector cases)cd ui && pnpm test(targeted vitest suites green)cd ui && pnpm typecheck+pnpm build(green)bash scripts/regen-generated-artifacts.sh(clean tree)Spec / plan
docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/feature_spec.md— Opus + GPT-5.5 cycle 2 converged with 0 High findings.docs/00_overview/planned_features/02_mvp2/feat_overnight_final_solution/implementation_plan.md— Opus + GPT-5.5 cycle 2 converged with 0 findings.Also fixes
chore_e2e_overnight_strategy_radix_select_timing(P2 quality-only follow-up).feat_proposal_full_param_space_view(P2, captured for separate later cycle; rides this branch per one-branch-per-session).🤖 Generated with Claude Code